Computing Entropy for Ortholog Detection

نویسندگان

  • Hsing-Kuo Kenneth Pao
  • John Case
چکیده

Biological sequences from different species are called orthologs if they evolved from a sequence of a common ancestor species and they have the same biological function. Approximations of Kolmogorov complexity or entropy of biological sequences are already well known to be useful in extracting similarity information between such sequences — in the interest, for example, of ortholog detection. As is well known, the exact Kolmogorov complexity is not algorithmically computable. In practice one can approximate it by computable compression methods. However, such compression methods do not provide a good approximation to Kolmogorov complexity for short sequences. Herein is suggested a new approach to overcome the problem that compression approximations may not work well on short sequences. This approach is inspired by new, conditional computations of Kolmogorov entropy. A main contribution of the empirical work described shows the new set of entropy-based machine learning attributes provides good separation between positive (ortholog) and negative (non-ortholog) data — better than with good, previously known alternatives (which do not employ some means to handle short sequences well). Also empirically compared are the new entropy based attribute set and a number of other, more standard similarity attributes sets commonly used in genomic analysis. The various similarity attributes are evaluated by cross validation, through boosted decision tree induction C5.0, and by Receiver Operating Characteristic (ROC) analysis. The results point to the conclusion: the new, entropy based attribute set by itself is not the one giving the best prediction; however, it is the best attribute set for use in improving the other, standard attribute sets when conjoined with them. Keywords—compression, decision tree, entropy, ortholog, ROC.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

متن کامل

Assessment Methodology for Anomaly-Based Intrusion Detection in Cloud Computing

Cloud computing has become an attractive target for attackers as the mainstream technologies in the cloud, such as the virtualization and multitenancy, permit multiple users to utilize the same physical resource, thereby posing the so-called problem of internal facing security. Moreover, the traditional network-based intrusion detection systems (IDSs) are ineffective to be deployed in the cloud...

متن کامل

A New Method for Root Detection in Minirhizotron Images: Hypothesis Testing Based on Entropy-Based Geometric Level Set Decision

In this paper a new method is introduced for root detection in minirhizotron images for root investigation. In this method firstly a hypothesis testing framework is defined to separate roots from background and noise. Then the correct roots are extracted by using an entropy-based geometric level set decision function. Performance of the proposed method is evaluated on real captured images in tw...

متن کامل

A New Method for Sperm Detection in Infertility Cure: Hypothesis Testing Based on Fuzzy Entropy Decision

In this paper, a new method is introduced for sperm detection in microscopic images for infertility treatment. In this method, firstly a hypothesis testing function is defined to separate sperms from plasma, non-sperm semen particles and noise. Then, some primary candidates are selected for sperms by watershed-based segmentation algorithm. Finally, candidates are either confirmed or rejected us...

متن کامل

Neural Network Based Protection of Software Defined Network Controller against Distributed Denial of Service Attacks

Software Defined Network (SDN) is a new architecture for network management and its main concept is centralizing network management in the network control level that has an overview of the network and determines the forwarding rules for switches and routers (the data level). Although this centralized control is the main advantage of SDN, it is also a single point of failure. If this main contro...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004